Libraries used in the analysis

library(ggplot2)
library(knitr)
library(dplyr)
library(org.Mm.eg.db)
library(DT)
library(GeneOverlap)
library(GO.db)
library(KEGGREST)
library(KEGG.db)
library(fgsea)
library(maftools)
library(pheatmap)
library(ensembldb)
library(stringr)
library(ensembldb)
library(BSgenome.Mmusculus.GENCODE.GRCm38.102)
library(biomaRt)
library(ggpubr)
library(DESeq2)
library(pheatmap)
library(ComplexHeatmap)

Loading in the necessary data

PD1 vs Combo (PD1/Combo)

Differential Expression Data

RSEM was run on each sample. Using the RSEM-quantified gene expression, DESeq2 was run between arms. The fold change order is as follows: A_vs_B => numerator_vs_denominator. The following table is interactive and shows the differential expression results for PD1 vs combination therapy samples. The magnitude log2 fold change is very small towards PD1 and towards Combo.

Volcano Plots

gsea results on GO and KEGG terms

Gene-set enrichment analysis within the GO pathways show enrichment for ribosome pathways, protein synthesis and catalysis, innate and adaptive immune response, B-cell proliferation, an antiviral immune response, interferon production, cellular response to interferon, RNA translation, and RNA and mRNA binding.

GO GSEA

KEGG GSEA

HALLMARK GSEA

Untreated vs Combo (Untreated/Combo)

Differential Expression Data

Volcano Plots

gsea results on GO and KEGG terms

GO GSEA

KEGG GSEA

HALLMARK Pathway GSEA

Untreated vs PD1 (Untreated/PD1)

Differential Expression Data

Volcano Plots

gsea results on GO,KEGG, and Hallmark Pathway terms

GO GSEA

KEGG GSEA

HALLMARK Pathway GSEA

Creating conglomerate gene and junction expression per sample

## combining and normalizing the gene expression matrix per sample

gene_expression_file <- read.table("/media/theron/My_Passport/Ali_data/HTMBmice_RNAseq_Data/MB01JHU504/MB01JHU504_000_analysis/RSEM/tables/genes/filenames.txt")
for (i in seq(nrow(gene_expression_file))){
  file<-gene_expression_file[i,]
  sample <- str_replace(basename(file),"_rsem_mm10.genes.results","")
  if (i == 1){
    expr_fill <- read.table(file,header=T)
    genes <- expr_fill$gene_id
    gene_expression<-data.frame(genes)
    gene_expression[,sample] <- expr_fill$expected_count
  } else {
    expr_fill <- read.table(file,header=T)
    gene_expression[,sample] <- expr_fill$expected_count
  }
}
rownames(gene_expression)<-gene_expression$genes
gene_expression<-gene_expression[,seq(2,ncol(gene_expression))]
gene_expression <- t(apply(gene_expression,1,round))
gene_expression_vst <- as.data.frame(varianceStabilizingTransformation(gene_expression))
gene_expression_rlog <- as.data.frame(rlog(gene_expression))

saveRDS(gene_expression_vst,file="/media/theron/My_Passport/Ali_data/HTMBmice_RNAseq_Data/MB01JHU504/MB01JHU504_000_analysis/RSEM/tables/genes/gene_expression_vst.rds")
gene_expression <- as.data.frame(gene_expression)
saveRDS("/media/theron/My_Passport/Ali_data/HTMBmice_RNAseq_Data/MB01JHU504/MB01JHU504_000_analysis/RSEM/tables/genes/gene_expression_vst.rds")

./media/theron/My_Passport/scripts/Ali_data/analysis/bash/create_junc_files.sh

Mutation immunogenicity

## -Reading
## -Validating
## -Silent variants: 817928 
## -Summarizing
## -Processing clinical data
## --Missing clinical data
## -Finished in 14.5s elapsed (24.5s cpu)

Mutation expression per sample

The majority of tumor-specific mutations are depleted in the combination therapy treatment. The combination therapy showed the greatest response. Therefore, we think that the tumor clones that contain immunogenic mutations are killed off in the combination therapy.

Here the average variance-stabilized gene expression of genes harboring mutations are shown for each treatment type. There is a significant difference in average between combination and untreated.

This tracks the variance-stabilized gene expression of specific mutation-harboring genes across treatment type.

A plot showing the variance-stabilized gene expression across treatment type. Each point on the x-axis is a mutation-harboring gene.

This plot shows how many mutation-harboring genes fall within a specific order of variance-stabilized gene expression across treatment type. The majority of mutation-harboring genes show depletion in combination therapy and higher expression in the pd1 and untreated arms.

This plot shows the distribution after subsetting to the “unique” set of genes. Genes can hardbor multiple mutations, affecting the immunogenicity score for that gene. No gene is repeated in this frequency plot.

The following three plots each show that there is no significant difference in immunogenic kmer binding strength across expression order type.

Variant Allele Frequency Heatmaps

The variant allele frequency (VAF) is calculated as follows: \(VAF=\frac{R_{v}}{R_{t}}\) where \(R_{v}\) is the number of reads in sample sequencing library that show expression of the variant and \(R_{t}\) is the total number of reads in the sample sequencing library that span the variant location. In the following heatmaps, any sample with no expression (\(R_{t}=0\)) associated with a variant is given a vaf value of -1. Across the heatmap 1 is added to the vaf values, scaling the vaf range from 0-2, where 0 indicates no expression, and 2 indicates complete expression of the variant. There are very few samples with mixed variant and non-variant expression. There are two sets of 3 heatmaps, the first unscaled and the second scaled across rows. Each set of figures is shown in three; one where both columns and rows are clustered, one where columns are not clustered and rows are clustered, and one where neither are clustered. There is no apparent relationship between variant allele frequency and expression order when looking at the heatmaps. The samples do not cluster by treatment type.

Variant Allele Frequency Across Expression Order

The following plots show the variant allele frequency across expression order groups for each sample. The first plot contains sample-mutation-gene points that do not have expression at the variant location. The second plot has these points filtered out. Of note, there are significant differences between the combo<pd1<untreated group and the untreated<pd1<combo and untreated<pd1<combo groups.

Variant Allele Frequency With Respect to binder count

The following plots show statistically significant, yet biologically irrelevant, negative correlation between binder count and variant allele frequency.

Variant Allele Frequency With Respect to binder count, vaf >=0

Splicemutr Immunogenicity

The splicing immunogenicity is calculated by identifying outlier splicing events between the combination and pd1 treatment groups. These outlier splicing events are then used to create transcripts which are translated, kmerized into 9mers, and run through mhc binding affinity prediction based on Balb/cJ mhc class 1 alleles. A gene splicing antigenicity metric is calculated as follows: \(G=\frac{\sum_{j \in J}\frac{R_{j}}{R_G}*k_{j}}{|J|}\) where \(R_{j}\) is the variance stabilized read count for the outlier splice-junction j, \(k_{j}\) is the number of immunogenic kmers predicted from the transcript associated with the junction j, \(R_{G}\) is the variance-stabilized read count for the specific gene G, and J is the set of junctions associated with the gene G. This metric serves as a weighted sum for the splicing-based immunogenic impact a gene has. Using this metric, calculated on combination and pd1 outlier junctions, we show that there are specific combination and pd1 immunogenecity signatures. We also show that overall, immunogenic scores are significantly lower in the combination associated genes than in the pd1 associated genes, supporting the idea of depletion of tumor clones with more immunogenic signatures.

calculating per-gene metric


./media/theron/My_Passport/scripts/Ali_data/analysis/bash/calc_gene_metric.sh

Splicemutr data